fix(trace): make empty/aborted runs fail loudly instead of silently "running" by burrows99 · Pull Request #7 · burrows99/trace-cli

burrows99 · 2026-06-19T07:27:46Z

Why

From the dashboard screenshot in the last session: a run.chrome session pinned at the RUNNING badge with 0 events, no video, no lineage — and the agent never flagged it, because the CLI told it the run succeeded. Every signal that would say "this trace is broken or incomplete" was a non-blocking warning, a stderr log, or dashboard-only state — never in the --json envelope the agent reads.

This is also a violation of the logger's own contract (src/shared/codes.ts): the stderr log and the envelope diagnostic for one event are supposed to carry the same code, so you can join "what I saw while it ran" to "what landed in the result." In practice the recording/upload failures were log-only and invisible to the result.

Changes

1. Terminal envelope on abort — DynamicCommand

A throw mid-run (attach failed, engine crashed, recording threw) left the initial running partial orphaned in the dashboard forever. The run now emits a terminal envelope via onProgress — no running flag, ok:false, an ENGINE_FATAL diagnostic — so the collector resolves the session. Cli flushes that emit before exiting non-zero.

2. Recording outcomes are envelope diagnostics, not just stderr — DynamicCommand.#record

RECORD_EMPTY (no frames → empty video) and RECORD (render/upload threw) were log.warn/log.error only, so "no video" was invisible to a --json reader. Both now also push a warn diagnostic with the same code as the log.

3. Upload failure no longer masquerades as a clean local save — DynamicCommand.#record

S3ArtifactStore.upload swallows failures and returns null, which #record couldn't tell apart from "no S3 configured" — both became a tidy "saved locally". It now distinguishes them and emits an UPLOAD diagnostic when a configured upload fails (link missing, local copy kept).

4. ENGINE_FATAL logged as well as diagnosed — so it appears in both channels per the contract.

The logging audit that motivated this (the "any violations?" answer)

Cross-referencing every log.* and Diagnostic.* call site against the codes.ts "same code in both channels" contract:

Class	Codes	Status
Log-only failures (invisible to the envelope)	`RECORD`, `RECORD_EMPTY`, `UPLOAD`	fixed here — now in both channels
Log-only, but surfaces elsewhere	`CHROME` (→ now ENGINE_FATAL via abort path), `LSP` (→ `CODEGRAPH_FAILED` downstream)	acceptable
In both channels (the contract, done right)	`CODEGRAPH_FAILED`, `EMIT`, `ENGINE_FATAL` (now)	✓
Diagnostic-only (no stderr trail)	`STEP_FAILED`, `BP_UNBOUND`, `BP_BOUND_UNHIT`, `GRAPH_TRUNCATED`, `COMPLEXITY_HIGH`, `DEPS_CIRCULAR`, `TOOL_MISSING`	acceptable — the envelope IS the agent's channel; documented
Intentional level split	`EMIT` (log `error`, diagnostic `warn`)	by design: the operation failed, the result is still valid

Also checked and clean: stdout discipline — the logger is stderr-only (logger.ts), and every process.stdout.write in src/ is a legitimate data/human channel write, so --json is never polluted.

Not fixed here (bigger / out of scope, tracked as recommendations): dashboard "age out stale running" UI net, and decoupling the final "done" emit from the S3 upload so a slow upload can't delay the running→done transition.

Tests

test/dynamic-diagnostics.test.js (new) — fake-tracer unit tests: thrown run → terminal envelope (running cleared, ENGINE_FATAL); captured fatal → ok:false; clean-empty node trace → ok:true, not running.
tsc --noEmit clean · npm test → 44/45 pass (1 DB round-trip skipped).

Base

Stacked on fix/collector-emit-bounded-memory (PR #6) — its Cli.ts edit is adjacent to this one's. Retarget to master once #6 lands.

🤖 Generated with Claude Code

…running" A chrome run that captured nothing and recorded nothing was reported as success (ok:true, exit 0) and could sit on the dashboard's "running" badge forever — the agent had no signal it was broken. Root cause: the failure signals lived in stderr logs or dashboard-only state, never in the JSON envelope the agent reads. This closes those gaps and the matching violations of the logger's own two-channel contract (codes.ts: the stderr log and the envelope diagnostic for one event should carry the SAME code). - Terminal envelope on abort. If a run throws (attach failed, engine crashed, recording threw), DynamicCommand now emits a terminal envelope via onProgress — no `running` flag, ok:false, an ENGINE_FATAL diagnostic — so the collector resolves the session instead of leaving its initial "running" partial orphaned. Cli flushes that emit before exiting non-zero. - Recording outcomes are now envelope diagnostics, not just stderr. RECORD_EMPTY (no frames → empty video) and RECORD (render/upload threw) were log-only, so "no video" was invisible to a --json reader. Both now push a warn diagnostic carrying the same code as the log. - Upload failure no longer masquerades as a clean local save. S3ArtifactStore swallows failures and returns null, which #record could not tell apart from "no S3 configured". It now distinguishes the two and emits an UPLOAD diagnostic when a configured upload fails (link missing, local copy kept). - ENGINE_FATAL is now logged as well as diagnosed, so it appears in both channels per the contract. Adds test/dynamic-diagnostics.test.js (fake-tracer unit tests for the abort terminal envelope, captured-fatal, and clean-empty cases). typecheck + build clean; 44/45 tests pass (1 DB round-trip skipped). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ccurately (#6) * fix(collector): bound emit failure memory and word network failures accurately Follow-up to the emit-feedback diagnostics (#5), addressing two Copilot review notes: - Cli.ts kept every failed EmitResult in an array but only ever read the count and the last one. On a run that emits per onProgress, repeated failures grew that array unbounded. Replace it with a counter plus the most recent failure. - The emit diagnostic said the collector "rejected" the emits even when the POST never reached it (connection refused/timeout/DNS — no HTTP status). Word an HTTP rejection and a delivery failure distinctly. - Collector.emit returned the full rejection body, which a caller retains; a large error page could bloat memory. Cap the stored body at 10k chars (the log line still truncates to 500 independently). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(trace): make empty/aborted runs fail loudly instead of silently "running" (#7) A chrome run that captured nothing and recorded nothing was reported as success (ok:true, exit 0) and could sit on the dashboard's "running" badge forever — the agent had no signal it was broken. Root cause: the failure signals lived in stderr logs or dashboard-only state, never in the JSON envelope the agent reads. This closes those gaps and the matching violations of the logger's own two-channel contract (codes.ts: the stderr log and the envelope diagnostic for one event should carry the SAME code). - Terminal envelope on abort. If a run throws (attach failed, engine crashed, recording threw), DynamicCommand now emits a terminal envelope via onProgress — no `running` flag, ok:false, an ENGINE_FATAL diagnostic — so the collector resolves the session instead of leaving its initial "running" partial orphaned. Cli flushes that emit before exiting non-zero. - Recording outcomes are now envelope diagnostics, not just stderr. RECORD_EMPTY (no frames → empty video) and RECORD (render/upload threw) were log-only, so "no video" was invisible to a --json reader. Both now push a warn diagnostic carrying the same code as the log. - Upload failure no longer masquerades as a clean local save. S3ArtifactStore swallows failures and returns null, which #record could not tell apart from "no S3 configured". It now distinguishes the two and emits an UPLOAD diagnostic when a configured upload fails (link missing, local copy kept). - ENGINE_FATAL is now logged as well as diagnosed, so it appears in both channels per the contract. Adds test/dynamic-diagnostics.test.js (fake-tracer unit tests for the abort terminal envelope, captured-fatal, and clean-empty cases). typecheck + build clean; 44/45 tests pass (1 DB round-trip skipped). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(collector): bound the rejection body at the stream, add emit-diagnostic tests Addresses the two Copilot review notes on the PR: - Collector.emit buffered the whole rejection body via response.text() before slicing, so an oversized error page still caused a transient memory spike and download latency even though the stored body was capped. Read only up to MAX_BODY_CHARS off the stream, then cancel it — bounding memory and latency at the source. - Extract the emit-failure diagnostic wording into an exported emitFailureMessage() helper and cover it with focused tests: HTTP status → "rejected", no status (network) → "failed" (the regression guard so a POST that never landed isn't reported as a rejection), plus the no-body / missing-error / oversized-body edge cases. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(collector): flush the TextDecoder so a split multi-byte char isn't dropped readCappedText decoded chunks with { stream: true } but never did a final decode() flush. A response ending on a multi-byte UTF-8 sequence split across the last chunk boundary would drop/garble that trailing character in the stored body and logged reason. Flush after the read loop. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…#8) Keep the prose docs in step with the behavior changes from the collector/trace fixes (#5–#7). - README: the docker-compose section showed the container ports (:4747 · :5432 · :9000) instead of the host-published ports the compose file actually maps. Correct to :14747 · :65432 · :19000 (console :19001), and the S3_ENDPOINT example to :19000, so the README matches docker-compose.yml. The native-serve example keeps :5432 (your own Postgres) — that one was right. - skills/trace/SKILL.md: add a "Reading the result" section. The skill is deliberately minimal (the binary is the source of truth), but it gave zero guidance on interpreting a trace — the gap behind an agent treating an empty or perpetually-"running" trace as success. Names the three stable fields that decide whether a run did what was asked (ok · diagnostics[].code · meta.running) and the rule that ok:true + events:[] + a warn diagnostic means "ran fine, saw nothing — re-aim," not success. Field-level (no drift-prone code list).

The DX-gaps analysis enumerated six ways the CLI misled an agent; all six have since shipped. Keep the analysis verbatim as the rationale, but add a status banner + table and a per-gap resolution note pointing to where each was closed (and flagging the few sub-items still open: dashboard age-out of stale "running" (#6b), done/upload decoupling (#6c), and lockfile-based port auto-discovery (#5)). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

burrows99 merged commit 45fc8da into fix/collector-emit-bounded-memory Jun 19, 2026
1 check passed

burrows99 deleted the fix/trace-runs-fail-loudly branch June 19, 2026 07:30

burrows99 mentioned this pull request Jun 19, 2026

docs: sync README host ports + teach the skill to read the result envelope #8

Merged

burrows99 mentioned this pull request Jun 19, 2026

docs: sync README ports, teach the skill to read the envelope, record DX-gap resolutions #9

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(trace): make empty/aborted runs fail loudly instead of silently "running"#7

fix(trace): make empty/aborted runs fail loudly instead of silently "running"#7
burrows99 merged 1 commit into
fix/collector-emit-bounded-memoryfrom
fix/trace-runs-fail-loudly

burrows99 commented Jun 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

burrows99 commented Jun 19, 2026

Why

Changes

The logging audit that motivated this (the "any violations?" answer)

Tests

Base

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant